Remote file transfer method and apparatus

ABSTRACT

Remote file transfer applications often involve a situation where a receiving computer (22) contains a reference file (48) that may be similar, or perhaps even identical to, a source file (46) to be transmitted by a sending computer (20). Disclosed is a file transfer method that identifies and isolates the differences between the two files, and transmits only those differences to the receiving computer. The method divides the data in the reference file into a plurality of blocks and associates each block of data with a key value. The key values are then sent to the sending computer in the form of an array. At the sending computer, a block of data at the source file is identified, its key value computed, and the key value is then compared to the keys in the array. If a match is found, an indication of such is sent to the receiving computer. Otherwise, a byte of data from the data block is sent to the receiving computer, and a subsequent block of data is identified and analyzed. The latter steps of the method are repeated until a representation of the source file is present at the receiving computer.

This is a continuation of application Ser. No. 08/182,969, filed Jan. 14, 1994, now U.S. Pat. No. 5,446,888.

FIELD OF THE INVENTION

The present invention relates to computer communications in general and, in particular, to a method and apparatus for decreasing the time required to update files located at a remote computer.

BACKGROUND OF THE INVENTION

In computer communications technology, the rate of data communication between a computer and other peripheral devices is very important. The ability to quickly and accurately transfer data between two personal computers is of special interest in light of the increased use of portable computers. Often, data entered into a portable computer is ultimately transferred to a user's home or office personal computer. Computer specialists are continually searching for communication protocols that decrease the time required to transfer data without compromising the reliability of the data being transmitted.

A conventional method for conveying data between computers, especially personal computers, involves the interconnection of a data bus in a sending computer with a data bus in a receiving computer. This may be done by coupling the serial, parallel, or similar communications ports of each computer through an interface link, such as a cable or across a data path using modems. In serial communication, data is transferred one bit at a time. Serial communications work well for transferring data over long distances, and particularly with modems that couple two computers using a telephone line. However, the time required to transfer data using serial communications can be significant, especially for larger files. When communicating between two devices that are relatively close, parallel communications are often used. Parallel communication is the simultaneous transfer of a number of bits of data in parallel, e.g., 8-bit, using a multi-bit data path.

Computer software companies are continually investigating more efficient methods of transferring data to reduce data transmission times. Two prevalent areas of concentration have been on increasing data transfer rates and on incorporating forms of data compression to reduce the amount of data being sent. Advances in data transfer rates have been accomplished by increasing the speed at which modems communicate in serial communication and by increasing the number of bits that can be transferred simultaneously in parallel communication. An example technology that incorporates the latter technique is described in U.S. Pat. No. 5,261,060, titled “Eight-bit Parallel Communications Method and Apparatus,” and assigned to the assignee of the present invention. U.S. Pat. No. 5,261,060 is hereby incorporated by reference. Data compression schemes reduce the size of a file to be transmitted by various means of compacting information. For example, one common compression technique, called key-word encoding, replaces words that occur frequently, e.g., “the,” with a 2-byte token representation of each word. After the compressed data is received by a remote computer, the data is decompressed to create a representation of the original contents of the file.

A more recent approach to decreasing the time required to transfer a file has recognized that a receiving computer will often have a file, i.e., a reference file, that is similar or perhaps even identical to a source file to be transmitted. For example, the source file may simply include text from the reference file with only a few words or sentences changed. Rather than sending an original or compressed representation of the entire source file, file transfer methods utilizing this approach identify the differences between the two files, and then transfer only the differences to the receiving computer. Upon receipt, the difference information is used to update the reference file at the receiving computer, thereby reproducing a precise copy of the source file. The present invention is directed toward an improved method of identifying and transferring revisions between a source file and a reference file to create an accurate copy of the source file at a remote computer.

SUMMARY OF THE INVENTION

The invention is a file transfer method that identifies and isolates the differences between a source file located at a sending computer and a reference file, located at a receiving computer, that may have data similar to the data comprising the source file. The computers are connected through a computer data interface. The method includes the steps of: (a) dividing the reference file into a plurality of data blocks and associating each data block with a key value representative of the data in each block; and (b) identifying blocks of data at the source file and using the key values to compare blocks of data from the reference file to blocks of data from the source file and, in instances where a match is found between a block of data from each file, sending an indication of the match to the receiving computer so that the block of data indicated by the match need not be transmitted to the receiving computer.

In accordance with other aspects of the invention, the step of identifying blocks of data at the source file includes the step of computing a source key for each block of data which is then compared to the key values from the reference file.

In accordance with still further aspects of the invention, an initial block of data is identified from the source file and a source key is computed from the initial block. If a match for the initial block is not found, the method includes the steps of: (i) transmitting a byte of data from the initial block to the receiving computer; and (ii) identifying a subsequent block of data from the source file comprising the initial block of data, less the transmitted byte, and a byte of data from the source file.

In accordance with other aspects of the invention, the method includes the step of transmitting the key values associated with data blocks in the reference file to the sending computer. Further, the key value for a block of data is computed by multiplying the bytes in the block by one or more multipliers, the value of the multiplier being dependent upon the position of a given byte in the block, and summing the results of the multiplication operations.

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing aspects and many of the attendant advantages of this invention will become more readily appreciated as the same becomes better understood by reference to the following detailed description, when taken in conjunction with the accompanying drawings, wherein:

FIG. 1 is a block diagram of a communications network including a sending computer and a receiving computer, each running a file transfer program that may be used to update files in accordance with the invention;

FIG. 2 is a block diagram depicting the representation of a reference file, that may have similarities to a source file to be transmitted, using a number of keys, with each key being associated with and representative of a block of data in the reference file;

FIG. 3 is a block diagram illustrating the selection of blocks of data at the source file on a sliding window basis;

FIG. 4 is a flow diagram of an exemplary routine for implementing a file transfer program in accordance with the invention;

FIG. 5 is a flow diagram of a first exemplary subroutine for determining key values for each block of data in the reference file;

FIG. 6 is a flow diagram of an exemplary routine in accordance with the invention for determining the differences in the source and reference files and transferring those differences to the receiving computer where a destination file is created;

FIG. 7A illustrates a second exemplary method of determining key values for each block of data in the reference file;

FIG. 7B is a flow diagram of a subroutine for implementing the method of determining key values shown in FIG. 7A; and

FIG. 7C is a flow diagram of a subroutine for determining the value of a key associated with a current block of data in the source file.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

Remote file transfer applications often involve a situation where a receiving computer already contains a file that is similar, or perhaps even identical to, a file to be transmitted. For example, the file to be transmitted may be a revision of a text file with only a few words or sentences changed. The invention is a file transfer method that identifies and isolates the differences between the two files, and transmits only those differences to the receiving computer. For similar files, the file transfer method can result in compression ratios far in excess of those achieved by traditional data compression methods.

FIG. 1 illustrates a typical operating environment in which the invention may be utilized. A sending computer 20 is coupled to a receiving computer 22 through a communications link 24. The computers are of a type generally known in the art, such as personal or laptop computers. The communications link may be any known means for transferring data between the two computers, such as the LAPLINK® series of file transfer tools manufactured and sold by Traveling Software, Inc., the assignee of the present invention.

The sending computer 20 generally comprises a processing unit 26, a memory 28, and a number of communications ports 30. The memory, including random access memory (RAM), read only memory (ROM), and external systems memory, is connected to the processing unit 26 by a data/address bus 32. The communications ports are connected to the processing unit by a data bus 34. The communications ports 30 include parallel and serial ports, as well as other input/output technologies including PCMCIA card technology, that allow data to be sent and received by the sending computer. The receiving computer 22 is similar to the sending computer, and includes a processing unit 36, memory 38, communications ports 40, data/address lines 42, and a data bus 44. Although for ease of description one computer is called the sending computer and the other is called the receiving computer, the computers are generally interchangeable.

In order to accomplish data transfer, the sending and the receiving computers include computer program controls that, for example, are stored in RAM and executed by the processing units of each computer. In one embodiment of the invention, the sending and receiving computer controls are combined into a single file transfer program 45 that is resident at each computer. In this manner, each computer can operate as a sending or receiving computer. Because of the requirements of handshaking, copies of the file transfer program 45 located at each computer are preferably executed simultaneously. This allows for full-duplex transmission, i.e., simultaneous communications in each direction. The invention may also be utilized in half-duplex communications, although not as efficiently.

For clarity in this discussion, throughout the Detailed Description it is assumed that a source file 46 located at the sending computer is to be sent to the receiving computer 22. Further, it is assumed that the receiving computer includes a reference file 48 that includes at least some similarities to the source file. Once a user indicates that a source file is to be transferred, a reference file that may have data that is similar to the source file is identified by, for example, having a file name that is the same or similar to the source file. The invention described herein generally assumes that a reference file has been identified. The basic steps implemented by the file transfer program are as follows:

(1) identifying a reference file at the receiving computer that may have data similar to data comprising the source file;

(2) dividing the data comprising the reference file into a plurality of data blocks having n-bytes per block and associating each data block with a key value;

(3) transmitting the key values from the receiving computer to the sending computer;

(4) identifying a current n-byte block of data from the source file and computing a value for a source key associated with the current block of data;

(5) comparing the value of the source key with each of the key values from the reference file and, if a match is found, (i) transmitting an indication of the match to the receiving computer, and (ii) repeating step (4); and

(6) if a match was not found, transferring to the receiving computer a byte of data from the current block of data, adding an additional byte of data from the source file to the current block of data, re-computing the value of the source key, and repeating step (5).

Generally, the loops created by steps (5) and (6) repeat until all of the data in the source file has been considered. At the receiving computer, a destination file is created from the match indications and the byte transmissions. The destination file will be a duplicate of the source file upon completion of transmission.

FIG. 2 illustrates pictorially step (2), which includes dividing the data comprising the reference file into a plurality of data blocks 50a, 50b, 50c . . . 50y, 50z and associating each data block with a key value 52a, 52b, 52c . . . 52y, 52z. It is noted that the last block of data may include fewer than n bytes, and thus is indicated as having x-bytes. Once the reference file is separated into the data blocks 50, the key value 52 of each block may be computed using a number of methods. In a first exemplary embodiment, each key is computed by adding the value of each byte of data in the block to produce a total of all of the bytes in the block. By way of background, each 8-bit character in any given block is representative of an ASCII value that ranges from 0 to 255, i.e., 2⁸−1. ASCII is an acronym for the American Standard Code for Information Interchange, a coding scheme that assigns numeric values to letters, numbers, punctuation marks, and certain other characters. Through the standardization of values used for such characters, ASCII enables computers and computer programs to exchange information. Calculating the keys as described above will produce a whole number between zero and 255n (e.g., 65,280 for a 256-byte block) that is representative of the data contained in any given block.

Once the keys for each block of data in the reference file have been computed, the keys are sent as an array to the sending computer for comparison to the source file.

FIG. 3 is a pictorial representation of steps (4)-(6). In step (4), a current n-byte block of data from the source file is identified and a value for a source key associated with the current block of data is computed. Thus, in the first comparison bytes zero through (n−1) are identified as the current block of data. Thereafter, the key value of the current block of data is computed using the same method that was used to compute the keys in the reference file. The key value for the current block of data is labeled KEY1.

In step (5), the value of KEY1 is compared to each of the keys in the reference file to determine whether a match has been found, thereby indicating that the current block of data is identical to a block of data in the reference file. If a match is found, an indication of such is sent to the receiving computer. Assuming a match has not been found, according to step (6) the first byte in the current block (byte zero) is sent to the receiving computer. A subsequent “current” block of data is then evaluated by subtracting the first byte of data (byte zero) from the current block, adding the next sequential byte of data (byte n) to the current block, and recomputing the key value for the subsequent current block. The key value for this block of data is labeled KEY2. Thus, KEY2 will comprise the values of bytes 1 through n. The value of KEY2 is then compared to each of the keys in the key array for the reference file.

Assuming a match is not made, the first byte in the current block (byte 1) is sent to the receiving computer. A third key KEY3 representing the current block of data is then computed by sliding the current block of data one byte to the right, such that KEY3 comprises the values of bytes 2 through (n+1). This will continue until either a match is found between a key value computed from a data block in the source file and a key in the key array for the reference file, or all of the data in the source file has been transmitted. Assuming a match is found, an indication of such is sent to the receiving computer, and a subsequent current block of data is computed from the source file.
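The sliding-window exchange described above can be sketched in Python as follows. This is only an illustrative sketch, not the routine of the figures: the function names, the tuple-based operation list, and the use of an in-memory byte string instead of a communications link are all assumptions made for the example, and the plain byte sum stands in for the key computation.

```python
def sum_key(block: bytes) -> int:
    """First exemplary key: the plain sum of the byte values in the block."""
    return sum(block)


def reference_keys(reference: bytes, n: int) -> list:
    """Receiver side: one key per n-byte block of the reference file."""
    return [sum_key(reference[i:i + n]) for i in range(0, len(reference), n)]


def encode(source: bytes, keys: list, n: int) -> list:
    """Sender side: emit ("match", block_number) or ("byte", value) operations.

    The operation tuples are a hypothetical stand-in for whatever is actually
    sent over the link.
    """
    ops = []
    pos = 0
    while len(source) - pos >= n:
        key = sum_key(source[pos:pos + n])
        if key in keys:
            ops.append(("match", keys.index(key)))
            pos += n          # start a fresh block after the matched one
        else:
            ops.append(("byte", source[pos]))
            pos += 1          # slide the window one byte to the right
    # Fewer than n bytes remain: send them as plain data.
    ops.extend(("byte", value) for value in source[pos:])
    return ops


def decode(ops: list, reference: bytes, n: int) -> bytes:
    """Receiver side: rebuild the destination file from the operations."""
    out = bytearray()
    for kind, value in ops:
        if kind == "match":
            out += reference[value * n:(value + 1) * n]
        else:
            out.append(value)
    return bytes(out)


reference = b"AAAABBBBCCCC"
source = b"AAAAxBBBBCCCC"          # one byte inserted after the first block
ops = encode(source, reference_keys(reference, 4), 4)
destination = decode(ops, reference, 4)
```

In this small example only the inserted byte travels as data; the three unchanged blocks are conveyed as match indications. Because the sum key can collide for different blocks, a sketch like this would still need the integrity check discussed later in the description.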

It is noted that in the case where a match is not found, the additional time required to transfer a file, in comparison to traditional methods, is negligible despite the sliding window and key computations. This is, in part, due to the fact that the processor can make computations much faster than data can be sent. Further, in a preferred embodiment, the receiving computer is configured to expect that bytes being received are data bytes and are not indicative of a match between two data blocks. When a match does occur, an additional “match-indicator” byte is sent ahead of the byte(s) indicating that a match has occurred. Thus, the number of bytes being sent in the case of no matches will generally be approximately the same as if the data were simply transmitted without any opportunity for match checks in accordance with the invention.

The foregoing is an overview of an exemplary embodiment of the file transfer program 45. Exemplary routines for implementing the file transfer program in software are set forth in FIGS. 4-6 and accompanying text. In that regard, FIG. 4 is a flow diagram of a routine for computing a key array from the contents of the reference file. The size of each block of data is set at block 100. In one embodiment, each block contains 256 bytes. At block 102, the variable nBlock, representing the current block of data being considered by the routine, is set to zero. The value of the key for the current block of data is computed at block 104. A suitable routine for computing the key value is illustrated in FIG. 5. At block 106, the array BlockKey[nBlock] is set equal to the value of the key computed for the current block. The variable nBlock is then incremented at block 108. A test is made at block 110 to determine whether the end of the reference file has been reached. If the end of the file has not been reached, the routine loops back to block 104. If the end of the file has been reached, the BlockKey array is sent to the sending computer at block 112 and the routine terminates.

FIG. 5 is a flow diagram of a first exemplary subroutine suitable for use in FIG. 4 (block 104) for computing the value of the key associated with a given block of data. The subroutine will be called for each data block in the reference file. At block 120, the variable “n,” which is representative of the byte count, is set equal to zero. At block 122, the variable “key” is also set equal to zero. A byte of data is then read from the reference file at block 124. Variable n is incremented at block 126. At block 128, the key is set equal to its previous value plus the value of the current byte of data that was read at block 124.

A test is made at block 130 to determine whether the end of the reference file has been reached. If the end of the reference file has not been reached, a test is made at block 132 to determine whether a full block of data has been considered, i.e., whether n is equal to the block size. If n is not equal to the block size, the subroutine loops back to block 124. If n is equal to the block size, or if it was determined at block 130 that the end of the file was reached, the subroutine terminates, and control returns to the routine of FIG. 4.
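As a rough illustration of the routines of FIGS. 4 and 5, the following Python sketch builds the key array by reading a byte at a time from a file-like object. The names `compute_block_key` and `build_block_keys` are hypothetical, and the exact end-of-file handling is an assumption, since the flow diagrams themselves are not reproduced here.

```python
import io

BLOCK_SIZE = 256  # block size used in one embodiment described above


def compute_block_key(stream, block_size=BLOCK_SIZE):
    """FIG. 5 style subroutine: sum up to block_size bytes read from stream.

    Returns (key, n), where n is the number of bytes actually read.
    """
    n = 0
    key = 0
    while n < block_size:
        byte = stream.read(1)
        if not byte:            # end of the reference file
            break
        n += 1
        key += byte[0]          # previous key plus the current byte value
    return key, n


def build_block_keys(stream, block_size=BLOCK_SIZE):
    """FIG. 4 style routine: one key per block, collected in a list."""
    block_keys = []
    while True:
        key, n = compute_block_key(stream, block_size)
        if n == 0:              # nothing left to read
            break
        block_keys.append(key)
        if n < block_size:      # short final block, end of file reached
            break
    return block_keys


keys = build_block_keys(io.BytesIO(b"\x01" * 256 + b"\x02" * 3))
```

Here the first key covers a full 256-byte block and the second covers the short final block of three bytes.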

FIG. 6 is a flow diagram of a routine for comparing keys associated with n-byte blocks of data from the source file with the keys computed from the reference file and contained in the BlockKey array. At block 150, the variable “current key” is set equal to zero. A test is made at block 152 to determine whether there are at least n bytes of data in the source file that have yet to be compared. If there are at least n bytes of data not yet compared, at block 154 an n-byte block of data is read from the source file. At block 156, the value of the current key, representing the current block of data, is computed using the same computation methods that were used in FIG. 5, i.e., by adding the weighted value of each byte in the current block of data. The key values in the BlockKey array are then searched at block 158 to determine whether any of the keys in the BlockKey array match the current key. The test for whether a match is found is performed at block 160.

If a match was found at block 160, a message is sent at block 162 to the receiving computer to emit the matching block to the destination file. The routine then loops to block 154. If a match was not found, at block 164 a first byte of data in the current block is sent to the receiving computer. At block 166, the byte of data that was sent to the receiving computer is removed from the current block of data. A test is then made at block 168 to determine whether there is any data remaining in the source file that has not been considered. If there is data remaining in the source file, a new byte of data is read from the source file at block 170 and added to the current block of data at block 172. The routine then loops to block 156 where the key for the current block of data is computed.

Those skilled in the art will appreciate that, in computing the current key in block 156 after looping from block 172, it is more efficient to obtain the value of the key for the current block by subtracting from the previously-computed current key the value of the byte that was removed from the current block (in block 166) and then adding the value of the byte that was added to the current block (in block 172), rather than performing the key calculation by adding every character in the current block.
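The incremental update just described can be checked with a few lines of Python. This is a minimal sketch assuming the plain byte-sum key; the helper name `slide_sum_key` and the sample data are invented for the illustration.

```python
def slide_sum_key(key: int, removed: int, added: int) -> int:
    """Update a byte-sum key when the window slides one byte to the right."""
    return key - removed + added


data = b"hello world"
n = 4                                   # illustrative block size
key = sum(data[0:n])                    # key for the first window
for start in range(1, len(data) - n + 1):
    # Drop the byte that left the window, add the byte that entered it.
    key = slide_sum_key(key, data[start - 1], data[start + n - 1])
    # The updated key agrees with a full recomputation over the window.
    assert key == sum(data[start:start + n])
```

Each slide costs one subtraction and one addition regardless of the block size, whereas a full recomputation would touch every byte in the block.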

If all of the data remaining in the source file has been considered, or if there were fewer than n bytes of data remaining in the source file as determined in the test at block 152, the data remaining in the source file is sent to the receiving computer at block 174. At the receiving computer, the transmitted data is added to the destination file and the file transfer is complete. In an alternative embodiment, instead of simply sending the data remaining as indicated in block 174, a test is made to determine if the key value of the data remaining matches the key value of the last block of data in the reference file. If a match is present, an indication of the match is sent to the receiving computer, and the data itself need not be transmitted. Otherwise, the actual data is transmitted. This will result in a further optimization of the transfer in situations where the end of the reference file contains the same data as the end of the source file.

Once all of the data in the source file has been added to the destination file, i.e., through blocks 162, 164 and 174, the destination file will in most circumstances be an exact copy of the source file. However, it is preferable that a check be made to ensure that the destination file is indeed a precise duplicate of the source file. In block 176, the integrity of the destination file is checked using means known to those skilled in the art. One method of checking the file integrity is a cyclic redundancy-check (CRC), such as that set forth in M. Nelson, The Data Compression Book, 446-448 (M&T Books 1991), which is hereby incorporated by reference. If the integrity of the destination file was compromised, the data from the source file is retransmitted to the destination file using conventional transmission methods. This is indicated at block 178. If the integrity of the destination file tested positive, or upon transmitting the source file, the routine terminates.
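One way such an integrity check might look is sketched below using the CRC-32 routine from Python's standard `zlib` module. This is an assumption made for illustration: it is not necessarily the CRC variant set forth in the cited reference, and the function names are hypothetical.

```python
import zlib


def source_checksum(source: bytes) -> int:
    """Sender side: CRC-32 of the source file, sent along with the transfer."""
    return zlib.crc32(source)


def destination_is_intact(destination: bytes, expected_crc: int) -> bool:
    """Receiver side: compare the destination file's CRC-32 to the sender's."""
    return zlib.crc32(destination) == expected_crc


crc = source_checksum(b"AAAAxBBBBCCCC")
ok = destination_is_intact(b"AAAAxBBBBCCCC", crc)
damaged = destination_is_intact(b"AAAAxBBBBCCCD", crc)
```

If the comparison fails, the source file would be retransmitted by conventional means, as described for block 178.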

One circumstance where the destination file may not be an accurate copy of the source file is where two or more different blocks of data yield the same key value. If it is assumed that each block of data is 256 bytes, under the key computation method described in FIG. 5, the range of possible key values is 0 to 65,280, the latter value occurring only if each byte in the block has a numerical value of 255. The odds of having duplicate keys are significantly decreased if: (1) the range of possible key values is relatively large, and/or (2) the likelihood that key computations will fall within a broader portion of the range is increased. In light of the above, it will be appreciated that the data file transfer method in accordance with the invention will be most accurate if the possibility of two different blocks of data having the same key value is extremely remote.
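A short Python illustration of the duplicate-key problem follows, assuming the plain byte-sum key of FIG. 5. The block contents are invented for the example.

```python
def sum_key(block: bytes) -> int:
    """Plain byte-sum key, as in the first exemplary embodiment."""
    return sum(block)


block_a = bytes([1, 2, 3, 4])
block_b = bytes([4, 3, 2, 1])          # same bytes in a different order

# Different data, identical key: any reordering of a block collides.
collision = block_a != block_b and sum_key(block_a) == sum_key(block_b)

# Largest possible key for a 256-byte block: every byte equal to 255.
max_key = sum_key(bytes([255] * 256))
```

Because the sum ignores byte order, every permutation of a block shares its key, which is one reason a position-dependent key computation is attractive.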

Another desirable feature of a key computation method is that the current key values for blocks of source data that are derived on a sliding window basis can be quickly established. One way of accomplishing this is to have a key computation method that allows the current key to be updated by subtracting the key value associated with the byte of data to be subtracted from the current block of data (block 166) and adding the key value associated with the byte of data to be added to the current block (block 172). While the key computation method described in FIG. 5 has this desirable feature, it may not work well for larger files because of its limited range of possible key values and the distribution of values within this range.

FIGS. 7A-7C illustrate a second exemplary embodiment for calculating keys in accordance with the invention in which the range of possible key values is extended beyond the summing scheme of FIG. 5, thereby decreasing the likelihood that any key value will be representative of more than a single block of data. Further, the calculation method allows the current key to be updated very quickly, as described in FIG. 7C and accompanying text. The examples of FIGS. 7A-7C illustrate a 32-bit key, but it will be appreciated that other key sizes may be implemented.

With reference to FIG. 7A, the 32-bit key is divided into a lower 24-bit segment and an upper 8-bit segment. The 24-bit segment is computed using the following equation:

$$C_1 \cdot n + C_2 \cdot (n-1) + C_3 \cdot (n-2) + \cdots + C_{n-1} \cdot 2 + C_n \qquad (1)$$

where $C_i$ is the character in the ith position of a current block and n is the number of bytes in each block. The upper 8 bits of the 32-bit key are calculated by performing an exclusive OR operation (XOR) on each of the characters, as shown by the equation:

$$C_1 \oplus C_2 \oplus C_3 \oplus \cdots \oplus C_{n-1} \oplus C_n \qquad (2)$$

Once the lower and upper key values are calculated, the bits are concatenated to form each 32-bit key.
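Equations (1) and (2) and the concatenation step can be written directly in Python as a minimal sketch. The function names are hypothetical, and masking the weighted sum to 24 bits is an assumption about how overflow would be handled, which the text does not specify.

```python
def key24(block: bytes) -> int:
    """Equation (1): the first byte is weighted n, the last byte is weighted 1."""
    n = len(block)
    total = sum(value * (n - i) for i, value in enumerate(block))
    return total & 0xFFFFFF            # assumed: keep only the lower 24 bits


def key8(block: bytes) -> int:
    """Equation (2): XOR of every byte in the block."""
    result = 0
    for value in block:
        result ^= value
    return result


def key32(block: bytes) -> int:
    """Concatenate the upper 8-bit segment with the lower 24-bit segment."""
    return (key8(block) << 24) | key24(block)


k = key32(bytes([1, 2, 4]))            # weights 3, 2, 1 and XOR 1 ^ 2 ^ 4
k_reordered = key32(bytes([4, 2, 1]))  # same bytes, different order
```

Unlike the plain sum, the position-dependent weights give reordered blocks different keys in this example.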

FIG. 7B illustrates a suitable subroutine for implementing the key calculations illustrated in FIG. 7A. The subroutine is called by the routine of FIG. 4 in lieu of calling the subroutine of FIG. 5. At block 200, the variable “n,” which is indicative of the byte count for any given block, is set equal to zero. At block 202, the lower and upper portions of the key, i.e., key.24 and key.8, are set equal to zero. The variable “sum” is set equal to zero at block 204. At block 206, a byte of data is read from the reference file.

At block 208, the value of the current byte is added to the variable sum. The key.24 variable is then increased by the value of sum at block 210. It will be appreciated that blocks 208 and 210 are alternate methods of computing equation (1) without requiring multiplication operations. At block 212, the variable key.8 is set equal to the previous value of key.8 XOR the current byte.

A test is made at block 214 to determine if the end of the reference file has been reached. If the end of the reference file has not been reached, a test is made at block 216 to determine whether n is equal to the block size. If n is not equal to the block size, the routine loops to block 206. If n is equal to the block size, or if the end of the file has been reached, the variable “key” is set by concatenating the lower 24-bit (key.24) value computed at block 210 with the upper 8-bit (key.8) value computed at block 212. The subroutine then terminates, and the program returns to block 106 of FIG. 4.
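The multiplication-free computation of blocks 208 through 212 can be sketched in Python as follows. The name `key32_running` and the 24-bit mask are assumptions for the illustration; `running_sum` plays the role of the variable “sum” in the text.

```python
def key32_running(block: bytes) -> int:
    """Compute the 32-bit key with running sums instead of multiplications."""
    running_sum = 0    # "sum" in the text
    key_24 = 0
    key_8 = 0
    for value in block:
        running_sum += value       # block 208: add the current byte to sum
        key_24 += running_sum      # block 210: add sum to key.24
        key_8 ^= value             # block 212: fold the byte into key.8
    # Assumed: only the lower 24 bits of the weighted sum are kept.
    return (key_8 << 24) | (key_24 & 0xFFFFFF)


def weighted_sum(block: bytes) -> int:
    """Equation (1) evaluated directly, for comparison."""
    n = len(block)
    return sum(value * (n - i) for i, value in enumerate(block))


sample = bytes([1, 2, 4])
k = key32_running(sample)
```

The first byte is included in the running sum at every later step, so it ends up counted n times, the second byte n−1 times, and so on, which reproduces the weights of equation (1).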

FIG. 7C illustrates a suitable subroutine that may be called from block 156 of FIG. 6 to compute the value of the current key for a current block of data. The subroutine illustrates the optimization of key calculations for those blocks of source data that are identified on a sliding window basis, and thus have key values similar to a key that has already been computed. A test is made at block 220 to determine if the current block was identified on a sliding window basis, or in other words, if the subroutine was called because a match with the previous block was not found. If the current block of data was not identified on this basis, the subroutine of FIG. 7B is called to compute the value of the current key. This occurs at the first block of data in the source file or after a match was found.

If the current block of data was identified on the basis of a previous, unmatched block of data, the first byte from the previous block (termed the removed byte from the operation of block 166) is subtracted from sum at block 224. At block 226, the key.24 is set equal to its previous value less the product of the block size times the value of the removed byte. At block 228, the new byte (added to the current block in block 172) is added to sum. The key.24 variable is then increased by the value of sum at block 230.

At block 232, the variable key.8 is set equal to the previous value of key.8 XOR the removed byte. An exclusive OR operation is then performed between key.8 and the new byte. At block 236 the variable “key” is set by concatenating the lower 24-bit (key.24) value computed at block 230 with the upper 8-bit (key.8) value computed at block 234. The subroutine then terminates, and control returns to block 158 of FIG. 6. As can be seen, the subroutine of FIG. 7C allows key values to be quickly computed, thus allowing faster operation of the file transfer program when looking for matches between blocks from the source file and blocks from the reference file.
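The sliding update of FIG. 7C can be sketched in Python and checked against a full recomputation. The state tuple, the function names, and the 24-bit mask are assumptions made for this illustration.

```python
MASK_24 = 0xFFFFFF  # assumed: key.24 keeps only its lower 24 bits


def full_state(block: bytes):
    """Compute (sum, key.24, key.8) from scratch, in the manner of FIG. 7B."""
    running_sum = key_24 = key_8 = 0
    for value in block:
        running_sum += value
        key_24 += running_sum
        key_8 ^= value
    return running_sum, key_24 & MASK_24, key_8


def slide_state(state, removed: int, added: int, block_size: int):
    """Update the state when the window slides one byte, as in FIG. 7C."""
    running_sum, key_24, key_8 = state
    running_sum -= removed                      # block 224
    key_24 -= block_size * removed              # block 226
    running_sum += added                        # block 228
    key_24 = (key_24 + running_sum) & MASK_24   # block 230
    key_8 ^= removed                            # block 232
    key_8 ^= added                              # XOR with the new byte
    return running_sum, key_24, key_8


def combine(state) -> int:
    """Concatenate key.8 (upper 8 bits) with key.24 (lower 24 bits)."""
    _, key_24, key_8 = state
    return (key_8 << 24) | key_24


data = bytes(range(50, 70))
n = 8                                   # illustrative block size
state = full_state(data[0:n])
for start in range(1, len(data) - n + 1):
    state = slide_state(state, data[start - 1], data[start + n - 1], n)
    # The incremental state matches a full recomputation for every window.
    assert state == full_state(data[start:start + n])
```

Removing the oldest byte costs one multiplication by the block size because that byte carried the largest weight; every remaining byte's weight rises by one when the updated running sum is added back.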

As will be appreciated by those skilled in the art, a large number of different key computation methods may be used in accordance with the invention. Thus, the invention is not to be limited by the exemplary key calculations illustrated herein. Any key computation that is not unnecessarily time-consuming computationally and that provides a relatively wide range of results may be beneficial. Moreover, the type of key computation used in any particular embodiment may depend upon the block size to achieve optimal results. Another key computation that may be used is to multiply each character in a block by its position in the block and sum the results of the multiplication operations. Another key computation that may be implemented is the CRC file integrity check discussed above. Although this method is extremely accurate, it may be too slow for many applications.

With reference again to FIG. 6, it is noted that further optimization may be achieved when searching the BlockKey array (block 158) by utilizing a binary search. A binary search is a type of search in which an item that may be present within an ordered list is found by repeatedly dividing the ordered list into two equal parts and searching the half that may contain the item. Because a binary search requires the list being searched to be in a known sequence, e.g., ascending order, the BlockKey array would need to be arranged accordingly in order for the search to be effective. A suitable standard binary search is set forth in H. Schildt, The Complete C Reference, 487-488 (Osborne McGraw-Hill 1987), which is hereby incorporated by reference.
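A binary search over the reference keys may be sketched as follows, using the `bisect` module of the Python standard library rather than the routine cited above. The function names `build_sorted_index` and `find_block` are hypothetical, and carrying each key's original block number alongside the sorted keys is an assumption made so that a match can still be reported by block number after sorting.

```python
from bisect import bisect_left


def build_sorted_index(block_keys):
    """Sort the reference keys in ascending order, remembering block numbers."""
    pairs = sorted((key, number) for number, key in enumerate(block_keys))
    sorted_keys = [key for key, _ in pairs]
    block_numbers = [number for _, number in pairs]
    return sorted_keys, block_numbers


def find_block(sorted_keys, block_numbers, key):
    """Return the block number whose key matches, or None if there is no match."""
    pos = bisect_left(sorted_keys, key)   # leftmost position where key could sit
    if pos < len(sorted_keys) and sorted_keys[pos] == key:
        return block_numbers[pos]
    return None
```

With keys `[40, 10, 30, 10]`, looking up 30 returns block 2, looking up 10 returns block 1 (the lowest-numbered block holding that key), and looking up 20 returns `None`. The sort is performed once per transfer, after which each lookup takes a number of comparisons proportional to the logarithm of the number of blocks.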

While the preferred embodiment of the invention has been illustrated and described, it will be appreciated that various changes can be made therein without departing from the spirit and scope of the invention.

1. A method of transmitting data from a source file located at a sending computer to a receiving computer, the computers being connected through a computer data interface, the method comprising the steps of: (a) dividing a reference file located at the receiving computer into a plurality of data blocks, each data block having a length of n bytes, and associating each data block with a reference key value determined in accordance with a key defining method by the data in that block; (b) transmitting the reference key values to the sending computer; (c) identifying blocks of data of length n bytes from the source file, determining source key values in accordance with the key defining method, and using the source and reference key values to compare blocks of data from the reference file to blocks of data from the source file and, in instances where a match is found between a block of data from each file, sending an indication of the match to the receiving computer so that the block of data indicated by the match need not be transmitted to the receiving computer.
 2. The method of claim 1 wherein an initial block of data is identified from the source file, a source key is determined from the initial block, and if a match for the initial block is not found; the method further including the step of transmitting a subset of the initial block to the receiving computer, the subset including less than all of the information in the initial block; and identifying blocks of data of length n bytes from the source file includes identifying from the source file a subsequent block of data of length n bytes comprising the initial block of data, less the transmitted subset, and additional data from the source file.
 3. The method of claim 1 wherein at least a portion of the key value for a block of data is computed by adding the value of each byte of data in the block to produce a total of all of the bytes in the block.
 4. The method of claim 1 wherein at least a portion of the key value for a block of data is computed by multiplying the bytes in the block by one or more multipliers, the value of the multiplier being dependent upon the position of a given byte in the block, and summing the results of the multiplication operations.
 5. A method of transmitting data from a source file located at a sending computer to a destination file located at a receiving computer, the computers being connected through a computer data interface, the method comprising the steps of: (a) identifying a reference file at the receiving computer that may have data identical to some of the data comprising the source file; (b) dividing the data comprising the reference file into a plurality of data blocks having n-bytes per block and associating each data block with a reference key value determined by a key defining method; (c) identifying an n-byte block of data from the source file and computing using the key defining method a current value for a source key associated with the identified block of data; (d) comparing the current value of the source key with each of the reference key values and, if a match is found, (i) transferring an indication of such to the receiving computer, and (ii) repeating step (c); and (e) if a match was not found in step (d), transferring to the receiving computer a subset including less than all the data in the n-byte block of data, removing the subset from the n-byte block of data, adding additional data from the source file to the n-byte block of data, re-computing using the key defining method a current value of the source key, and repeating step (d).
 6. The method of claim 5, wherein steps (c) and (d) are repeated only until all of the data in the source file has been considered.
 7. The method of claim 5 wherein recomputing a current value of the source key of step (e) includes deriving at least a part of the current value of the source key from at least a part of the previous source key by removing the contribution to the part of the source key from the transmitted subset and integrating into the part of the source key a contribution from the additional data.
 8. The method of claim 1 wherein each byte of data in the reference file is used in the determination of not more than one reference key, and in which at least some of the bytes of data in the source file are used in the determination of multiple source keys.
 9. The method of claim 2 in which the source key and the reference key include multiple bits and in which some of the bits are determined by a summing operation and some of the bits are determined by a logical operation.
 10. The method of claim 9 in which the summing operation includes multiplying by constant coefficients the values represented by bytes of the blocks of source data and in which the logical operation comprises an exclusive OR operation.
 11. The method of claim 2 wherein the key defining method for the blocks of data includes the following calculation: C₁(n) + C₂(n−1) + C₃(n−2) + . . . + Cₙ₋₁(2) + Cₙ, where Cₙ is the character in the nth position of the block of data.
 12. The method of claim 11 wherein the key defining method includes the following logical operation: C₁ XOR C₂ XOR C₃ . . . Cₙ₋₁ XOR Cₙ.
 13. A method of changing data at a receiving unit until the data at the receiving unit is identical to data at a source unit, comprising: (a) determining multiple reference keys corresponding to groups of data stored at the receiving unit; (b) transmitting the multiple reference keys to the source unit; (c) determining a source key corresponding to a group of source data in the source unit; (d) comparing the source key with the multiple reference keys; (e) transmitting data from the source unit to the receiving unit if the source key does not match any of the reference keys; (f) transmitting a control signal from the source unit to the receiving unit if the source key matches a reference key, the control signal causing the receiving unit to use data at the receiving unit corresponding to the matched reference key; and (g) repeating steps (c), (d), (e), and (f) for additional groups of source data in the source unit until the data at the receiving unit is identical to the data at the source unit.
 14. The method of claim 13 wherein the data transmitted is a subset of the group of source data associated with the matching source key, the subset including less than all of the information in the initial block.
 15. The method of claim 13 wherein each byte of data in the receiving unit is used in the determination of not more than one reference key, and in which at least some of the bytes of the data in the source unit are used in the determination of multiple source keys.
 16. An apparatus for changing data at a receiving unit until the data at the receiving unit is identical to data at a source unit, comprising: means for determining an array of reference keys corresponding to groups of data stored at the receiving unit; data transfer means for transmitting the multiple reference keys to the source unit; means for determining source keys corresponding to groups of source data in the source unit; means for comparing the source keys with the multiple reference keys; means for transmitting data from the source unit to the receiving unit when a source key does not match any of the reference keys; and means for transmitting a control signal from the source unit to the receiving unit when a source key matches a reference key, the control signal causing the receiving unit to use a group of data at the receiving unit corresponding to the matched reference key.
 17. The apparatus of claim 16 wherein the means for determining source keys determines a new source key after the means for comparing source keys has compared the previously determined source key, and wherein the means for determining source keys determines the new source key from a group of source data, the composition of which is determined by whether the previously compared source key matched a reference key.
 18. A method of creating at a receiving computer a duplicate file that is identical to a source file at a sending computer, the duplicate file being formed in part from data in a reference file located at the remote receiving computer and in part from data in the source file transmitted from the sending computer, the computers being connected through a computer data interface, the method comprising the steps of: (a) dividing the reference file located at the receiving computer into a plurality of data blocks of uniform length and associating with each data block a reference key value determined by the data in that block in accordance with a key defining method; and (b) identifying blocks of data of the uniform length from the source file, determining source key values in accordance with the key defining method, and comparing the source and reference key values to determine whether blocks of data from the reference file match blocks of data from the source file and, in instances where a match is found between a block of data from each file, sending an indication to the receiving computer to copy the block of data from the reference file into the duplicate file so that the block of data indicated by the match need not be transmitted to the receiving computer, wherein the blocks of data from the source file are sequentially identified and each source block of data includes some of the data from the preceding source block of data if the preceding source block of data did not match a reference block of data.
 19. The method of claim 18 in which the uniform length of the data blocks is at least 256 bytes.
 20. The method of claim 18 in which the key defining method defines keys that are at least 32 bits in length.
 21. A method of creating at a receiving computer a duplicate file that is identical to a source file at a sending computer, the duplicate file being formed in part from data in a reference file located at the remote receiving computer and in part from data in the source file transmitted from the sending computer, the computers being connected through a computer data interface, the method comprising the steps of: (a) dividing the reference file located at the receiving computer into a plurality of data blocks of uniform length and associating each data block with a reference key value determined by the data in that block in accordance with a key defining method; (b) identifying blocks of data of the uniform length from the source file, determining source key values in accordance with the key defining method, and using the source and reference key values to compare blocks of data from the reference file with blocks of data from the source file; (c) in instances where a match is found between a block of data from each file, sending an indication of the match to the receiving computer to copy the block of data from the reference file to the duplicate file so that the block of data indicated by the match need not be transmitted to the receiving computer; and (d) in instances where a match is not found between a block of data from each file, transmitting fewer bytes than the number of bytes in the uniform length from the source file to the receiving computer and adding transmitted bytes to the duplicate file.
 22. A method of creating at a receiving computer a duplicate file that is identical to a source file at a sending computer, the duplicate file being formed in part from data in a reference file located at the remote receiving computer and in part from data in the source file transmitted from the sending computer, the computers being connected through a computer data interface, the method comprising the steps of: (a) dividing the reference file located at the receiving computer into a plurality of data blocks, each data block having a length of n bytes, and associating each data block with a reference key value determined by the data in that block in accordance with a key defining method; (b) identifying blocks of data of length n bytes from the source file, determining source key values in accordance with the key defining method, and using the source and reference key values to compare blocks of data from the source file with blocks of data from the reference file to find a match; (c) in instances where a match is found between a block of data from each file, sending an indication of the match to the receiving computer to copy the block of data from the reference file to the duplicate file so that the block of data indicated by the match need not be transmitted from the sending computer to the receiving computer; and (d) in instances where a match is not found: (i) transmitting a subset of an initial block to the receiving computer and adding the subset to the duplicate file; (ii) identifying from the source file a subsequent block of data of length n bytes comprising the initial block of data, less the transmitted subset, and additional data from the source file; and (iii) determining for the subsequent block of data a source key, the source key being derived from the source key determined from the initial block of data by removing the contribution from the transmitted subset and incorporating a contribution from the additional data.
 23. The method of claim 13 wherein if the preceding source key did not match a reference key, the subsequent source key is determined from a group of source data that includes some but not all of the data in the preceding group of source data and also includes data not included in the preceding group of source data and, if the preceding source key did match a reference key, the source key corresponds to a block of source data that directly follows the data used to determine the previous source key.
 24. A method of changing data at a receiving unit until the data at the receiving unit is identical to data at a source unit, comprising: (a) determining using a key defining method multiple reference keys corresponding to data groups of length n bytes stored at the receiving unit; (b) transmitting the multiple reference keys to the source unit; (c) determining using the key determining method a source key corresponding to a group of source data of length n bytes in the source unit; (d) comparing the source key with the multiple reference keys; (e) transmitting data from the source unit to the receiving unit if the source key does not match any of the reference keys; (f) transmitting a control signal from the source unit to the receiving unit if the source key matches a reference key, the control signal causing the receiving unit to use data at the receiving unit corresponding to the matched reference key; and (g) repeating steps (c), (d), (e), and (f) for additional groups of source data in the source unit until the data at the receiving unit is identical to the data at the source unit, wherein the groups of source data comprise, if the preceding source key did not match a reference key, n−1 bytes from the first group of data and one additional byte of data, and if the preceding source key did match a reference key, n bytes of data different from the n bytes of the preceding source group of data.
 25. An apparatus for changing data at a receiving unit so that the data at the receiving unit is identical to data at a source unit, comprising: means for determining using a key defining method an array of reference keys having lengths of corresponding to data groups having a uniform length of at 256 bytes and stored at the receiving unit; data transfer means for transmitting the multiple reference keys to the source unit; means for determining using the key defining method source keys corresponding to groups of source data of the uniform length in the source unit; means for comparing the source keys with the multiple reference keys; means for transmitting, when a source key does not match any of the reference keys, less than all the data that is included in the group of source data used to determine the source key; and means for transmitting a control signal from the source unit to the receiving unit when a source key matches a reference key, the control signal causing the receiving unit to use a group of data at the receiving unit corresponding to the matched reference key.
 26. A method of creating at a receiving computer a duplicate file that is identical to a source file at a sending computer, the duplicate file being formed in part from data in a reference file located at the receiving computer and in part from data in the source file transmitted from the sending computer, the computers being connected through a computer data interface, the method comprising the steps of: (a) identifying a reference file at the receiving computer that may have data identical to the data comprising the source file; (b) dividing the data comprising the reference file into a plurality of data blocks having n-bytes per block and associating each data block with a reference key value determined by a key defining method; (c) transmitting the reference key values from the receiving computer to the sending computer; (d) identifying an n-byte block of data from the source file and computing using the key defining method a current value for a source key associated with the identified block of data; (e) comparing the current value of the source key with each of the reference key values and, if a match is found, (i) transmitting an indication of such to the receiving computer, which adds the matching data from the reference file to the duplicate file, and (ii) repeating step (d); and (f) if a match was not found in step (e), transferring to the receiving computer a subset of the n-byte block of data to be added to the duplicate file and repeating step (d).
 27. The method of claim 26 in which, if a match was not found in step (d), a new block of data in the source file is defined by removing the transmitted subset from the previous n-byte block of data, adding additional data from the source file to the new n-byte block of data, re-computing using the key defining method a current value of the source key, and repeating step (e).
 28. The method of claim 26 in which, if a match is found in step (d), a new block of data in the source file is defined by the n-bytes immediately following the n-bytes used to form the previous source block of data.
 29. The method of claim 26 in which the reference keys comprise a first part and a second part, the calculation of each part being independent of the calculation of the other part.
 30. The method of claim 26 in which the uniform length of the data blocks is at least 256 bytes.
 31. The method of claim 26 in which the key defining method defines keys that are at least 32 bits in length.
 32. A method of creating a first reference data file at a first location that is identical to a source data file at a second location, the method comprising the steps of: (a) identifying a reference file at the first location that may have data identical to the data comprising a source data file; (b) dividing the data comprising the reference file into a plurality of data blocks having n-bytes per block and associating each data block with a reference key value determined by a key defining method; (c) transmitting the reference key values from the first location to the second location; (d) identifying an n-byte block of data from the source data file and computing using the key defining method a current value for a source key associated with the identified block of data; (e) comparing the current value of the source key with each of the reference key values and, if a match is found, (i) transferring an indication of such to the first location, which adds the matching data from the reference file to a duplicate file, and (ii) repeating step (d); and (f) if a match was not found in step (e), transferring to the first location a subset of the n-byte block of data to be added to the duplicate file and repeating step (d).
 33. A method of changing data stored at a receiving unit to match data stored at a source unit, comprising: determining multiple first keys corresponding to groups of data stored at a first unit; determining a second key corresponding to a group of data stored at a second unit; comparing the second key with the multiple first keys; designating which of the first unit and the second unit are the source unit and the receiving unit; and transmitting from the designated source unit to the designated receiving unit data corresponding to the second key if the second key matches none of the multiple first keys, and leaving unchanged in the designated receiving unit the data corresponding to the second key if the second key matches one of the multiple first keys.
 34. The method of claim 33 wherein the comparing the second key with the multiple first keys takes place in the second unit, and further comprising transmitting the multiple first keys from the first unit to the second unit.
 35. The method of claim 34 wherein the first unit is the designated receiving unit.
 36. The method of claim 33, further comprising: successively determining different second keys corresponding to different groups of data stored at the second unit; successively comparing each of the different second keys with the multiple first keys; and transmitting from the designated source unit to the designated receiving unit data corresponding to each of the different second keys that matched none of the multiple first keys.
 37. A method of transmitting data from a source computer to a receiving computer, the source and receiving computers being connected through a computer data interface, comprising: dividing a first file into multiple data blocks and associating each data block of the multiple data blocks with a first key value determined in accordance with a key defining method by the data in the data block; identifying multiple data blocks from a second file and determining second key values in accordance with the key defining method; using the first and second key values to compare data blocks from the first file and from the second file; designating which of the first file and the second file are located at the source computer and at the receiving computer; and for instances in which a match is found between a data block from the first file and a data block from the second file, leaving unchanged the data block stored in the designated receiving computer.
 38. The method of claim 37 wherein a selected data block from the first file is identified and a selected first key from the selected data block is determined and, for instances in which no match is found between a data block from the first file and a data block from the second file, the method further comprising: transmitting to the designated receiving computer a subset of the selected data block from the first file, the subset including less than all of the information in the selected data block; and identifying from the first file a subsequent data block comprising the selected data block less the subset transmitted to the designated receiving computer, and additional data from the first file.
 39. The method of claim 37 wherein each of the multiple data blocks from the first and second files includes multiple bytes of data of which each byte has a value, and wherein at least a portion of the key value for a data block from any one of the first and second files is computed by adding the value of each byte of data in the data block to produce a total for all of the bytes in the data block.